Repeated measures for 11 individuals, mean (sd)
| Round | Duration | Number Correct |
|---|---|---|
| 1 | 7.5 (2.0) | 9.0 (3.3) |
| 2 | 7.5 (2.0) | 9.0 (3.3) |
| 3 | 7.5 (2.0) | 9.0 (3.3) |
| 4 | 7.5 (2.0) | 9.0 (3.3) |
Regression of Duration on Number Correct repeated for each round
| Round | Term | Estimate | SE |
|---|---|---|---|
| 1 | (Intercept) | 3.0 | 1.12 |
| 1 | num.correct | 0.5 | 0.12 |
| 2 | (Intercept) | 3.0 | 1.13 |
| 2 | num.correct | 0.5 | 0.12 |
| 3 | (Intercept) | 3.0 | 1.12 |
| 3 | num.correct | 0.5 | 0.12 |
| 4 | (Intercept) | 3.0 | 1.12 |
| 4 | num.correct | 0.5 | 0.12 |
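The identical summaries above match R's built-in `anscombe` data set (Anscombe's quartet), and the first printed rows of `test_data` agree with its `x1`/`y1` columns. Assuming that is where the numbers come from, a minimal base-R sketch reproduces both tables; the name `quartet_summary` is illustrative, not the author's:

```r
# Assumption: the four "rounds" correspond to the four x/y pairs in the
# built-in anscombe data set (11 rows each).
quartet_summary <- do.call(rbind, lapply(1:4, function(i) {
  num_correct <- anscombe[[paste0("x", i)]]
  duration    <- anscombe[[paste0("y", i)]]
  fit <- lm(duration ~ num_correct)   # regression of duration on number correct
  data.frame(
    round         = i,
    mean_duration = mean(duration),
    sd_duration   = sd(duration),
    mean_correct  = mean(num_correct),
    sd_correct    = sd(num_correct),
    intercept     = unname(coef(fit)[1]),
    slope         = unname(coef(fit)[2])
  )
}))

# Every round shares the same means, standard deviations, and fitted line
# once rounded to one decimal place.
print(round(quartet_summary, 1))
```

The summaries agree across rounds even though the underlying point patterns differ, which is why plotting the data matters.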
ggplot2
dplyr, reshape2, and tidyr
A visual display that illustrates one or more relationships among numbers…a shorthand means of presenting information that would take many more words and numbers to describe.
—Stephen M. Kosslyn. Graph Design for the Eye and Mind. Oxford University Press, 2006
It depends on the goal:
ggplot2 do much of this work for you
data.frame
test_data
## # A tibble: 44 × 4
## round respondent num.correct duration
## <fctr> <fctr> <dbl> <dbl>
## 1 1 1 10 8.04
## 2 1 2 8 6.95
## 3 1 3 13 7.58
## 4 1 4 9 8.81
## 5 1 5 11 8.33
## 6 1 6 14 9.96
## 7 1 7 6 7.24
## 8 1 8 4 4.26
## 9 1 9 12 10.84
## 10 1 10 7 4.82
## # ... with 34 more rows
data.frame) for every layer
my_plot <- ggplot(data = test_data, mapping = aes(x = duration,
    y = num.correct))
aes() is used to create a list of aesthetic mappings
x refers to the graph’s x-axis, y to the y-axis
duration \(\rightarrow\) x-axis
num.correct \(\rightarrow\) y-axis
my_plot now represents a ggplot object set to our defaults
data comes first, mapping comes second
my_plot <- ggplot(test_data, aes(x = duration, y = num.correct))
print(my_plot)
+ operator to combine ggplot elements
my_plot + geom_point()
print() call, so the following two lines are equivalent:
my_plot + geom_point()
print(my_plot + geom_point())
my_plot + geom_point()
my_plot + geom_line()
my_plot + geom_point() + geom_line()
identity function, \[f(x)=x\] That is, the data are left unchanged
geom_point and geom_line is identity so these plots show the data as is
geom_histogram is a binning function (called stat_bin)
ggplot(test_data, aes(x = duration)) + geom_histogram(binwidth = 2)
## # A tibble: 44 × 4
## round respondent num.correct duration
## <fctr> <fctr> <dbl> <dbl>
## 1 1 1 10 8.04
## 2 1 2 8 6.95
## 3 1 3 13 7.58
## 4 1 4 9 8.81
## 5 1 5 11 8.33
## 6 1 6 14 9.96
## 7 1 7 6 7.24
## 8 1 8 4 4.26
## 9 1 9 12 10.84
## 10 1 10 7 4.82
## # ... with 34 more rows
## # A tibble: 5 × 2
## x y
## <dbl> <dbl>
## 1 4 4
## 2 6 13
## 3 8 20
## 4 10 5
## 5 12 2
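The binned table above is what `stat_bin` computes before anything is drawn. A rough base-R sketch of the same counting step follows. It assumes that `duration` is the four `y` columns of the built-in `anscombe` data (an inference from the printed rows, not stated in the original) and that the bins of width 2 are centred on 4, 6, ..., 12:

```r
# Assumption: test_data$duration equals the stacked y1..y4 columns of anscombe.
duration <- unlist(anscombe[c("y1", "y2", "y3", "y4")], use.names = FALSE)

# Bins of width 2 centred on 4, 6, 8, 10, 12, i.e. edges at 3, 5, ..., 13.
edges <- seq(3, 13, by = 2)
bin_counts <- table(cut(duration, breaks = edges))

binned <- data.frame(
  x = head(edges, -1) + 1,     # bin centres
  y = as.vector(bin_counts)    # number of observations per bin
)
print(binned)
```

Under those assumptions the counts should match the `y` column of the table above; `geom_histogram` then draws one bar per row.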
| Item | Default stat/geom |
|---|---|
| geom_point | stat_identity (\(f(x)=x\)) |
| geom_line | stat_identity (\(f(x)=x\)) |
| geom_histogram | stat_bin (binning) |
| geom_smooth | stat_smooth (regression) |
| stat_smooth | geom_smooth (line + ribbon) |
| stat_bin | geom_bar (vertical bars) |
| stat_identity | geom_point (dots) |
ggplot(test_data, aes(x = duration)) + stat_bin(binwidth = 1)
ggplot(test_data, aes(x = duration)) + geom_histogram(binwidth = 1)
ggplot(test_data, aes(x = round,
y = duration)) + geom_point()
ggplot(test_data, aes(x = round,
y = duration)) + geom_boxplot()
| Item | Required | Optional |
|---|---|---|
| geom_point | x, y | alpha, colour, fill, shape, size, stroke |
| geom_line | x, y | alpha, colour, linetype, size |
| geom_pointrange | x, ymax, ymin | alpha, colour, linetype, size |
my_plot + geom_point(
mapping = aes(colour = round))
my_plot + geom_point(
colour="red")
identity meaning don’t do anything special
stack or dodge
g <- ggplot(test_data, aes(x = num.correct, fill = round))
g + stat_bin(binwidth = 4,
position = 'stack')
g + stat_bin(binwidth = 4,
position = 'dodge')
Cmd-Enter (Mac) or Control-Enter (Windows)
mpg which is included in the ggplot2 package
http://goo.gl/Gx5LAK
library(ggplot2) #library(tidyverse)
?mpg
This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.
mpg
A data frame with 234 rows and 11 variables
model: model name
displ: engine displacement, in litres
year: year of manufacture
cyl: number of cylinders
trans: type of transmission
drv: f = front-wheel drive, r = rear wheel drive, 4 = 4wd
cty: city miles per gallon
hwy: highway miles per gallon
fl: fuel type
class: “type” of car
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29
## 3 audi a4 2.0 2008 4 manual(m6) f 20 31
## 4 audi a4 2.0 2008 4 auto(av) f 21 30
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26
## 7 audi a4 3.1 2008 6 auto(av) f 18 27
## 8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26
## 9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25
## 10 audi a4 quattro 2.0 2008 4 manual(m6) 4 20 28
## # ... with 224 more rows, and 2 more variables: fl <chr>, class <chr>
x mapped to cty
y mapped to hwy
point geometry
identity stat
identity position
http://goo.gl/Gx5LAK
colour, shape, or size
facet
g <- ggplot(mpg, aes(x = displ, y = hwy))
g + geom_point(aes(colour = drv))
g + geom_point() + facet_wrap(~drv)
colour, shape, or size, ggplot2 automatically maps those variables to group
group aesthetic controls how collections of items are rendered
geom_line the group aesthetic determines which points will be connected by a continuous line
stat_summary the group aesthetic determines which points are summarised by a common statistic
v is continuous but you want to use it for grouping, either specify group = v or transform it into a discrete variable, e.g., colour = factor(v)
ggplot(mpg, aes(x = displ, y = hwy,
colour=cyl)) +
geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess'
ggplot(mpg, aes(x = displ, y = hwy,
colour=factor(cyl))) +
geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess'
aes(group=1) when creating a layer
ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
geom_point() + geom_smooth(aes(group = 1))
## `geom_smooth()` using method = 'loess'
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point() +
scale_y_log10(breaks = c(15, 30, 45))
ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) + geom_point() +
scale_y_log10(breaks = c(15, 30, 45)) + labs(x = "Displacement (litres)",
y = "Highway miles per gallon (log scale)",
colour = "Drive train",
title = "Engine size and fuel consumption")
ggplot(mpg, aes(x = displ, y = hwy, colour = plyr::revalue(drv,
c(f = "Fore", r = "Rear", `4` = "4WD")))) + geom_point() +
labs(colour = "Drive train")
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point() +
facet_wrap(~drv, labeller = as_labeller(c(f = "Fore",
r = "Rear", `4` = "4WD")))
library(dplyr)
dplyr package makes these steps fast and easy:
Source: Introduction to dplyr vignette
(e <- exp(1))
## [1] 2.718282
log(e)
## [1] 1
Usage: log(x, base = exp(1))
e %>% log
## [1] 1
e %>% log()
## [1] 1
e %>% log(.)
## [1] 1
e %>% log(2)
## [1] 1.442695
e %>% log(base = 2)
## [1] 1.442695
e %>% log(., base = 2)
## [1] 1.442695
Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head
bop(
scoop(
hop(foo_foo, through = forest),
up = field_mice
),
on = head
)
foo_foo %>%
hop(through = forest) %>%
scoop(up = field_mice) %>%
bop(on = head)
select
rename
mutate
arrange
summarise
group_by
d %>% select(cty, hwy)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy |
|---|---|
| 11 | 17 |
| 20 | 26 |
| 11 | 15 |
| 17 | 24 |
d %>% select(starts_with("c"))
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | cyl |
|---|---|
| 11 | 6 |
| 20 | 4 |
| 11 | 8 |
| 17 | 6 |
d %>% select(highway = hwy, everything(), -cyl)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| highway | cty | displ |
|---|---|---|
| 17 | 11 | 3.3 |
| 26 | 20 | 2.5 |
| 15 | 11 | 4.6 |
| 24 | 17 | 3 |
d %>% rename(highway = hwy)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | highway | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
d %>% mutate(z = hwy/cty)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ | z |
|---|---|---|---|---|
| 11 | 17 | 6 | 3.3 | 1.545455 |
| 20 | 26 | 4 | 2.5 | 1.3 |
| 11 | 15 | 8 | 4.6 | 1.363636 |
| 17 | 24 | 6 | 3 | 1.411765 |
d %>% mutate(sqrt(displ))
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ | sqrt(displ) |
|---|---|---|---|---|
| 11 | 17 | 6 | 3.3 | 1.81659 |
| 20 | 26 | 4 | 2.5 | 1.581139 |
| 11 | 15 | 8 | 4.6 | 2.144761 |
| 17 | 24 | 6 | 3 | 1.732051 |
d %>% arrange(cty, hwy)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 15 | 8 | 4.6 |
| 11 | 17 | 6 | 3.3 |
| 17 | 24 | 6 | 3 |
| 20 | 26 | 4 | 2.5 |
d %>% arrange(desc(cty), hwy)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ |
|---|---|---|---|
| 20 | 26 | 4 | 2.5 |
| 17 | 24 | 6 | 3 |
| 11 | 15 | 8 | 4.6 |
| 11 | 17 | 6 | 3.3 |
d %>% filter(cty == 11)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 11 | 15 | 8 | 4.6 |
d %>% filter(hwy/cty > 1.4)
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 17 | 24 | 6 | 3 |
d %>% summarise(hwy = mean(hwy), cty = mean(cty))
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| hwy | cty |
|---|---|
| 20.5 | 14.75 |
d %>% summarise_each(funs(mean))
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ |
|---|---|---|---|
| 14.75 | 20.5 | 6 | 3.35 |
With summarise…
d %>% group_by(cyl) %>% summarise_each(funs(mean))
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cyl | cty | hwy | displ |
|---|---|---|---|
| 4 | 20 | 26 | 2.5 |
| 6 | 14 | 20.5 | 3.15 |
| 8 | 11 | 15 | 4.6 |
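`summarise_each()` and `funs()` are deprecated in current dplyr releases. Assuming dplyr 1.0.0 or later, and rebuilding the four-row `d` by hand from the tables above, the same two summaries can be sketched with `across()`:

```r
library(dplyr)

# The example data frame d, reconstructed from the tables above.
d <- data.frame(
  cty   = c(11, 20, 11, 17),
  hwy   = c(17, 26, 15, 24),
  cyl   = c(6, 4, 8, 6),
  displ = c(3.3, 2.5, 4.6, 3)
)

# Mean of every column (the role played by summarise_each(funs(mean))).
overall <- d %>% summarise(across(everything(), mean))

# Mean of every non-grouping column within each value of cyl.
by_cyl <- d %>% group_by(cyl) %>% summarise(across(everything(), mean))

print(overall)
print(by_cyl)
```

Inside a grouped summary, `everything()` skips the grouping column, so `cyl` appears once as the key rather than being averaged.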
d %>% group_by(cty) %>% summarise(mean(hwy), n())
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | mean(hwy) | n() |
|---|---|---|
| 11 | 16 | 2 |
| 17 | 24 | 1 |
| 20 | 26 | 1 |
With mutate…
d %>% group_by(cyl) %>% mutate(max(hwy))
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ | max(hwy) |
|---|---|---|---|---|
| 11 | 17 | 6 | 3.3 | 24 |
| 20 | 26 | 4 | 2.5 | 26 |
| 11 | 15 | 8 | 4.6 | 15 |
| 17 | 24 | 6 | 3 | 24 |
d %>% group_by(cty) %>% mutate(displ = displ - mean(displ))
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | 3.3 |
| 20 | 26 | 4 | 2.5 |
| 11 | 15 | 8 | 4.6 |
| 17 | 24 | 6 | 3 |
| cty | hwy | cyl | displ |
|---|---|---|---|
| 11 | 17 | 6 | -0.65 |
| 20 | 26 | 4 | 0 |
| 11 | 15 | 8 | 0.65 |
| 17 | 24 | 6 | 0 |
e %>% group_by(manufacturer, model) %>% summarise(cty = mean(cty),
n = n()) %>% filter(cty == max(cty)) %>% rename(max_cty = cty)
| manufacturer | model | cty |
|---|---|---|
| audi | a4 | 18 |
| audi | a4 | 21 |
| audi | a4 | 20 |
| audi | a4 | 21 |
| audi | a4 | 16 |
| audi | a4 | 18 |
| audi | a4 | 18 |
| audi | a4 quattro | 18 |
| audi | a4 quattro | 16 |
| audi | a4 quattro | 20 |
| audi | a4 quattro | 19 |
| audi | a4 quattro | 15 |
| audi | a4 quattro | 17 |
| … | … | … |
| manufacturer | model | max_cty | n |
|---|---|---|---|
| audi | a4 | 18.85714 | 7 |
| chevrolet | malibu | 18.80000 | 5 |
| dodge | caravan 2wd | 15.81818 | 11 |
| ford | mustang | 15.88889 | 9 |
| honda | civic | 24.44444 | 9 |
| hyundai | sonata | 19.00000 | 7 |
| jeep | grand cherokee 4wd | 13.50000 | 8 |
| land rover | range rover | 11.50000 | 4 |
| lincoln | navigator 2wd | 11.33333 | 3 |
| … | … | … | … |
library(tidyr)
e %>% separate(trans, c("type", "detail"), sep = "[\\(\\)]",
extra = "drop", remove = TRUE)
| model | year | trans |
|---|---|---|
| a4 | 1999 | auto(l5) |
| a4 | 1999 | manual(m5) |
| a4 | 2008 | manual(m6) |
| a4 | 2008 | auto(av) |
| a4 quattro | 1999 | manual(m5) |
| a4 quattro | 1999 | auto(l5) |
| a4 quattro | 2008 | manual(m6) |
| a4 quattro | 2008 | auto(s6) |
| a6 quattro | 1999 | auto(l5) |
| … | … | … |
| model | year | type | detail |
|---|---|---|---|
| a4 | 1999 | auto | l5 |
| a4 | 1999 | manual | m5 |
| a4 | 2008 | manual | m6 |
| a4 | 2008 | auto | av |
| a4 quattro | 1999 | manual | m5 |
| a4 quattro | 1999 | auto | l5 |
| a4 quattro | 2008 | manual | m6 |
| a4 quattro | 2008 | auto | s6 |
| a6 quattro | 1999 | auto | l5 |
| … | … | … | … |
separate is unite
f %>% unite(trans, type, detail, sep = "_")
| model | year | type | detail |
|---|---|---|---|
| a4 | 1999 | auto | l5 |
| a4 | 1999 | manual | m5 |
| a4 | 2008 | manual | m6 |
| a4 | 2008 | auto | av |
| a4 quattro | 1999 | manual | m5 |
| a4 quattro | 1999 | auto | l5 |
| a4 quattro | 2008 | manual | m6 |
| a4 quattro | 2008 | auto | s6 |
| a6 quattro | 1999 | auto | l5 |
| … | … | … | … |
| model | year | trans |
|---|---|---|
| a4 | 1999 | auto_l5 |
| a4 | 1999 | manual_m5 |
| a4 | 2008 | manual_m6 |
| a4 | 2008 | auto_av |
| a4 quattro | 1999 | manual_m5 |
| a4 quattro | 1999 | auto_l5 |
| a4 quattro | 2008 | manual_m6 |
| a4 quattro | 2008 | auto_s6 |
| a6 quattro | 1999 | auto_l5 |
| … | … | … |
dw %>% gather(type, mpg, cty, hwy)
| model | displ | trans | cty | hwy |
|---|---|---|---|---|
| a4 | 2 | m6 | 20 | 31 |
| a4 | 2 | av | 21 | 30 |
| a4 | 3.1 | av | 18 | 27 |
| a4q | 2 | m6 | 20 | 28 |
| a4q | 2 | s6 | 19 | 27 |
| a4q | 3.1 | s6 | 17 | 25 |
| a4q | 3.1 | m6 | 15 | 25 |
| a6q | 3.1 | s6 | 17 | 25 |
| a6q | 4.2 | s6 | 16 | 23 |
| model | displ | trans | type | mpg |
|---|---|---|---|---|
| a4 | 2.0 | m6 | cty | 20 |
| a4 | 2.0 | av | cty | 21 |
| a4 | 3.1 | av | cty | 18 |
| a4q | 2.0 | m6 | cty | 20 |
| a4q | 2.0 | s6 | cty | 19 |
| a4q | 3.1 | s6 | cty | 17 |
| a4q | 3.1 | m6 | cty | 15 |
| a6q | 3.1 | s6 | cty | 17 |
| a6q | 4.2 | s6 | cty | 16 |
| a4 | 2.0 | m6 | hwy | 31 |
| a4 | 2.0 | av | hwy | 30 |
| a4 | 3.1 | av | hwy | 27 |
| a4q | 2.0 | m6 | hwy | 28 |
| … | … | … | … | … |
dl %>% spread(type, mpg)
| model | displ | trans | type | mpg |
|---|---|---|---|---|
| a4 | 2.0 | m6 | cty | 20 |
| a4 | 2.0 | av | cty | 21 |
| a4 | 3.1 | av | cty | 18 |
| a4q | 2.0 | m6 | cty | 20 |
| a4q | 2.0 | s6 | cty | 19 |
| a4q | 3.1 | s6 | cty | 17 |
| a4q | 3.1 | m6 | cty | 15 |
| a6q | 3.1 | s6 | cty | 17 |
| a6q | 4.2 | s6 | cty | 16 |
| a4 | 2.0 | m6 | hwy | 31 |
| a4 | 2.0 | av | hwy | 30 |
| a4 | 3.1 | av | hwy | 27 |
| a4q | 2.0 | m6 | hwy | 28 |
| … | … | … | … | … |
| model | displ | trans | cty | hwy |
|---|---|---|---|---|
| a4 | 2 | av | 21 | 30 |
| a4 | 2 | m6 | 20 | 31 |
| a4 | 3.1 | av | 18 | 27 |
| a4q | 2 | m6 | 20 | 28 |
| a4q | 2 | s6 | 19 | 27 |
| a4q | 3.1 | m6 | 15 | 25 |
| a4q | 3.1 | s6 | 17 | 25 |
| a6q | 3.1 | s6 | 17 | 25 |
| a6q | 4.2 | s6 | 16 | 23 |
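In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The sketch below shows the same round trip, assuming a recent tidyr and rebuilding `dw` by hand from the wide table above. Note that `pivot_longer()` interleaves the `cty` and `hwy` rows for each car rather than stacking them as `gather()` does.

```r
library(tidyr)

# dw, reconstructed from the wide table above.
dw <- data.frame(
  model = c("a4", "a4", "a4", "a4q", "a4q", "a4q", "a4q", "a6q", "a6q"),
  displ = c(2, 2, 3.1, 2, 2, 3.1, 3.1, 3.1, 4.2),
  trans = c("m6", "av", "av", "m6", "s6", "s6", "m6", "s6", "s6"),
  cty   = c(20, 21, 18, 20, 19, 17, 15, 17, 16),
  hwy   = c(31, 30, 27, 28, 27, 25, 25, 25, 23)
)

# Wide to long: one row per car and type of mileage.
dl <- pivot_longer(dw, cols = c(cty, hwy), names_to = "type", values_to = "mpg")

# Long back to wide: one column per value of type.
dw_again <- pivot_wider(dl, names_from = type, values_from = mpg)

print(dl)
print(dw_again)
```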
library(dplyr)
library(tidyr)
data(mpg, package = "ggplot2")
## # A tibble: 5 × 2
## sid name
## <dbl> <chr>
## 1 100 Ann
## 2 101 Bob
## 3 102 Cam
## 4 103 Dee
## 5 104 Els
## # A tibble: 7 × 3
## sid grade course
## <dbl> <dbl> <chr>
## 1 100 8.0 A94
## 2 101 6.5 A94
## 3 103 7.0 A94
## 4 100 9.0 B90
## 5 103 5.5 B90
## 6 102 7.5 C14
## 7 90 7.0 C14
inner_join(students, grades)
## Joining, by = "sid"
## # A tibble: 6 × 4
## sid name grade course
## <dbl> <chr> <dbl> <chr>
## 1 100 Ann 8.0 A94
## 2 100 Ann 9.0 B90
## 3 101 Bob 6.5 A94
## 4 102 Cam 7.5 C14
## 5 103 Dee 7.0 A94
## 6 103 Dee 5.5 B90
sid exists in both tables so is assumed to be a key column
students %>% inner_join(grades)
students %>% inner_join(grades, by = "sid")
students %>% left_join(grades)
## Joining, by = "sid"
## # A tibble: 7 × 4
## sid name grade course
## <dbl> <chr> <dbl> <chr>
## 1 100 Ann 8.0 A94
## 2 100 Ann 9.0 B90
## 3 101 Bob 6.5 A94
## 4 102 Cam 7.5 C14
## 5 103 Dee 7.0 A94
## 6 103 Dee 5.5 B90
## 7 104 Els NA <NA>
students %>% right_join(grades)
## Joining, by = "sid"
## # A tibble: 7 × 4
## sid name grade course
## <dbl> <chr> <dbl> <chr>
## 1 100 Ann 8.0 A94
## 2 101 Bob 6.5 A94
## 3 103 Dee 7.0 A94
## 4 100 Ann 9.0 B90
## 5 103 Dee 5.5 B90
## 6 102 Cam 7.5 C14
## 7 90 <NA> 7.0 C14
students %>% full_join(grades)
## Joining, by = "sid"
## # A tibble: 8 × 4
## sid name grade course
## <dbl> <chr> <dbl> <chr>
## 1 100 Ann 8.0 A94
## 2 100 Ann 9.0 B90
## 3 101 Bob 6.5 A94
## 4 102 Cam 7.5 C14
## 5 103 Dee 7.0 A94
## 6 103 Dee 5.5 B90
## 7 104 Els NA <NA>
## 8 90 <NA> 7.0 C14
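To compare the four joins side by side, the sketch below rebuilds `students` and `grades` by hand from the printed tibbles and counts the rows each join returns. The construction code is not shown in the original, so the `data.frame()` calls are an assumption, and `join_rows` is an illustrative name.

```r
library(dplyr)

# Reconstructed from the printed tables above.
students <- data.frame(
  sid  = c(100, 101, 102, 103, 104),
  name = c("Ann", "Bob", "Cam", "Dee", "Els")
)
grades <- data.frame(
  sid    = c(100, 101, 103, 100, 103, 102, 90),
  grade  = c(8, 6.5, 7, 9, 5.5, 7.5, 7),
  course = c("A94", "A94", "A94", "B90", "B90", "C14", "C14")
)

# Row counts per join type; by = "sid" makes the key explicit.
join_rows <- c(
  inner = nrow(inner_join(students, grades, by = "sid")),
  left  = nrow(left_join(students, grades, by = "sid")),
  right = nrow(right_join(students, grades, by = "sid")),
  full  = nrow(full_join(students, grades, by = "sid"))
)
print(join_rows)
```

The inner join keeps only `sid` values present in both tables, the left and right joins each add the unmatched row from one side (Els, or the grade for `sid` 90), and the full join keeps both.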
install.packages("nycflights13")
library(nycflights13)
library(readr)
readr::read_csv instead of base::read.csv
install.packages("readxl", dependencies = TRUE)
library(readxl)
install.packages("haven", dependencies = TRUE)
library(haven)
install.packages('twitteR', dependencies = TRUE)
library(twitteR)
setup_twitter_oauth("your_consumer_key", "your_consumer_secret",
"your_access_token", "your_access_secret")
searchTwitter(searchString = "#hashtag", n = 100, lang = "en",
since = NULL, until = NULL, locale = NULL, geocode = NULL,
sinceID = NULL, maxID = NULL, resultType = "recent",
retryOnRateLimit = 120)
rvest is installed:
install.packages("rvest", dependencies = TRUE)
readr is installed:
install.packages("readr", dependencies = TRUE)
$ cd "name of your git workspace folder goes here"
$ git clone "url to your colleague's github repository"
.Rproj file and try to knit their R Markdown file